Measurement invariance of the strengths and difficulties questionnaire across socioeconomic status and ethnicity from ages 3 to 17 years: A population cohort study

Mental health inequalities along ethnic and socioeconomic groupings are well documented. The extent to which these observed inequalities are genuine or reflect measurement differences is unclear. In the current study we sought to investigate this in a large population-based sample of children and adolescents in the United Kingdom. The main objective of the study was to establish whether the parent-report Strengths and Difficulties Questionnaire (SDQ) was invariant across ethnicity and socioeconomic status groupings at six time points from 3 to 17 years (maximum N = 17,274). First, we fitted a series of confirmatory factor analysis models to the data and confirmed that the five-factor structure (emotional problems; peer problems; conduct problems; hyperactivity/inattention; and prosocial behaviour) had acceptable fit at ages 5, 7, 11, and 14 years. Next, we tested configural, metric, and scalar invariance at these time points and demonstrated scalar invariance across household income, parent highest education, and ethnicity categories. The five-factor structure did not fit well at ages 3 and 17 years; therefore invariance was not tested at these ages. These findings suggest the parent-report SDQ can be used to measure socioeconomic and ethnic inequalities in mental health from ages 5–14 years but more consideration is required outside these ages.


Introduction
Identifying mental health difficulties during childhood and adolescence is important for providing support to those affected. Whilst mean differences between socioeconomic and ethnic categories are well documented [e.g., 1], the challenge of consistently measuring mental health difficulties between groups remains; it is unclear whether existing measures assess the same constructs across groups comparably. This raises further questions about whether comparing group means appropriately assesses mental health inequalities rather than reflecting measurement artefacts. Standardised psychopathology assessments may show systematic socioeconomic and ethnic biases. Many measures were developed in White populations and may not reflect symptom profiles common in ethnically diverse backgrounds [2]. Similarly, children and adolescents from privileged socioeconomic backgrounds may display different symptom profiles compared to those from less privileged backgrounds [3], or their parents might identify and report difficulties differently. To increase understanding of potential measurement differences, measurement invariance was tested across different levels of socioeconomic status and ethnic groups in a nationally representative sample of children and adolescents in the United Kingdom (UK) at multiple ages from 3 to 17 years.

The strengths and difficulties questionnaire
We focussed on the parent-report Strengths and Difficulties Questionnaire [SDQ; 4]; a widely used screening tool for psychopathology in children and adolescents. The SDQ contains 25 items forming five subscales each with five statements rated as 'Not true', 'Somewhat true' and 'Certainly true'. Four of the five subscales capture difficulties: emotional problems, peer problems, conduct problems, and hyperactivity/inattention. The fifth subscale captures strengths: prosocial behaviour.
The SDQ may be scored as: a single total difficulties score (summing the four problem subscales); two difficulties scores, internalising (summing emotional and peer problems) and externalising (summing conduct problems and hyperactivity/inattention); or four difficulties scores (emotional problems, peer problems, conduct problems, and hyperactivity/inattention). The prosocial behaviour subscale is usually scored separately. The four-subscale structure corresponds most closely to discrete symptoms of psychiatric conditions: emotional problems to depression and anxiety, conduct problems to conduct disorder, and hyperactivity/inattention to attention deficit hyperactivity disorder. The peer problems subscale identifies social isolation and peer-related interpersonal difficulties rather than symptoms of a discrete psychiatric condition.
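The alternative scorings can be made concrete with a short sketch. It assumes each item response is already coded 0-2 and that the SDQ's positively worded difficulty items have already been reverse-scored, so that a higher score always means more of the construct; the dictionary keys are hypothetical labels, not MCS variable names.

```python
from typing import Dict, List

# Hypothetical subscale labels; each maps to five item scores coded 0-2.
DIFFICULTY_SUBSCALES = ("emotional", "peer", "conduct", "hyperactivity")


def subscale_scores(items: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the five item scores of each subscale (range 0-10)."""
    scores = {}
    for name, values in items.items():
        if len(values) != 5 or any(v not in (0, 1, 2) for v in values):
            raise ValueError(f"{name}: expected five scores coded 0-2")
        scores[name] = sum(values)
    return scores


def sdq_summary(items: Dict[str, List[int]]) -> Dict[str, int]:
    """Derive the alternative SDQ scorings from the subscale sums."""
    s = subscale_scores(items)
    return {
        **s,  # four difficulties subscales (plus prosocial, if supplied)
        # Total difficulties excludes prosocial behaviour (range 0-40).
        "total_difficulties": sum(s[k] for k in DIFFICULTY_SUBSCALES),
        "internalising": s["emotional"] + s["peer"],  # range 0-20
        "externalising": s["conduct"] + s["hyperactivity"],  # range 0-20
    }
```

Keeping prosocial behaviour out of every difficulties sum mirrors the convention that it is scored separately.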
Several factor structures, corresponding to different SDQ scoring methods, have been assessed extensively. These include a three-factor structure [5], a five-factor structure [6], and bi-factor and second-order structures [7]. Direct comparisons of the various structures in the present study sample [8] and others [7,9,10] suggest that the five-factor structure fits best; it was therefore the structure tested in the present study.

Socioeconomic and ethnic inequalities in mental health difficulties
Socioeconomic and ethnic inequalities in child and adolescent psychopathology are frequently reported. Adolescents from South Asian or Black ethnic groups in the UK experience higher levels of mental health difficulties compared to their White counterparts [11]. Other studies report that ethnic minorities might have fewer mental health difficulties compared to their White peers [12,13]. These reported ethnic inequalities might also be sensitive to developmental periods. In one study, ethnic minorities in the UK had more mental health difficulties in early childhood than their White counterparts, but by early adolescence ethnic minorities had fewer difficulties [14]. Such reported inequalities are usually explained in two ways: as a consequence of belonging to the ethnic group itself (e.g., cultural practices) and/or as a result of social, educational, or economic disadvantage experienced by these groups [15]. Indeed, evidence suggests that some, but not all, of the reported ethnic inequalities in mental health difficulties can be explained by socioeconomic status [16].
Socioeconomic status itself is an independent predictor of child and adolescent psychopathology. The effects of socioeconomic status on mental health difficulties are likely to manifest through various pathways. For example, children from low socioeconomic backgrounds have less access to material and social resources to support social, emotional, and cognitive development, which is in turn linked to the development of psychopathology [17]. Children from low socioeconomic backgrounds are also more likely to experience trauma [18], which is associated with poor mental health [19]. Additionally, socioeconomic pressures may negatively affect parent-child relationships, which have a knock-on effect on development [20,21]. There is, therefore, consistent evidence of a measured difference in mental health difficulties across socioeconomic and ethnic groups.
What remains unclear, however, is the extent to which measurement characteristics contribute to the observed differences. To illustrate, it may be that individuals from certain ethnic minorities or those from lower socioeconomic backgrounds have lower levels of reading ability or language proficiency [22] and so are less able to understand questionnaire items. Similarly, even if well understood, the instruments used to measure mental health difficulties may measure different constructs across groups. That is, even if the same items are being used to measure difficulties across groups, the likelihood of certain items being endorsed may depend on belonging to certain groups. For example, in one study, Black individuals were less likely to endorse questions on worthlessness and suicidal ideation compared to White individuals with the same level of depression [23]. Therefore, before comparing mental health difficulties between socioeconomic and ethnic groups, it is necessary to establish construct invariance.

Construct invariance
Construct invariance (or construct equivalence) refers to whether a psychological construct has the same meaning between groups [24]; in this case, whether mental health difficulties as measured using the SDQ items hold the same meaning across different socioeconomic and ethnic groups. Construct invariance can be categorised and assessed at three hierarchical levels [25]. Configural invariance assesses whether the factor structure is equivalent across groups, i.e., whether the number of factors and the pattern of factor loadings are similar across groups. Metric invariance assesses whether, in addition to configural invariance, the magnitudes of the factor loadings are equivalent across groups. In practice, the highest level required for comparing group means is scalar invariance, which assesses whether, in addition to metric invariance, item thresholds are equivalent between groups. This means that individuals in different comparison groups who have equivalent levels of the latent factor will have equivalent expected scores on the indicator variables, i.e., the comparison groups interpret the indicator variables in the same way [26]. Therefore, achieving scalar invariance would imply that any cross-group differences in mean SDQ scores reflect genuine differences in mental health difficulties rather than group-related differences in how the measure is understood and completed.
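For three-point ordinal items such as those of the SDQ, the three levels can be sketched with a latent-response formulation. This is a simplified notation (each item $j$ loads on one factor $\eta$, groups are indexed by $g$, and the identification constraints that software imposes are omitted), not the exact parameterisation of any particular program. A response $y_{ij}$ of child $i$ to item $j$ in category $c$ is assumed to arise when a continuous latent response $y^{*}$ falls between two thresholds $\tau$:

$$
\begin{aligned}
y^{*(g)}_{ij} &= \lambda^{(g)}_{j}\,\eta^{(g)}_{i} + \varepsilon^{(g)}_{ij},
\qquad
y_{ij} = c \iff \tau^{(g)}_{j,c} < y^{*(g)}_{ij} \le \tau^{(g)}_{j,c+1},
\quad c \in \{0, 1, 2\},\ \tau^{(g)}_{j,0} = -\infty,\ \tau^{(g)}_{j,3} = +\infty\\[4pt]
\text{Configural:}\quad & \lambda^{(g)}_{j} \neq 0 \text{ for the same item-factor pairs in every group } g\\
\text{Metric:}\quad & \lambda^{(g)}_{j} = \lambda_{j} \text{ for all } g\\
\text{Scalar:}\quad & \lambda^{(g)}_{j} = \lambda_{j} \text{ and } \tau^{(g)}_{j,c} = \tau_{j,c} \text{ for all } g \text{ and } c
\end{aligned}
$$

Under scalar invariance, two children with the same $\eta$ have the same probability of each response category regardless of group, which is what licenses comparing group means.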

The current study
Longitudinal and gender invariance during childhood and adolescence were recently established for the parent-report SDQ using data from the Millennium Cohort Study [27], which is also used in the present study. To our knowledge, SDQ measurement invariance across socioeconomic and ethnic categories during childhood and adolescence in the UK has not been fully investigated. In one study, researchers investigated measurement invariance between British Indian children and their White peers [28]. They found that the parent-, teacher-, and child-report SDQs were invariant for British Indian and White children. The focus on British Indian children, however, does not indicate whether the measure is suitable for assessing inequalities across the range of other ethnic minority groups in the UK. This matters because the UK has many other minority ethnic groups, and British Indian families tend to be the least disadvantaged of these on a range of indicators [29]. In another study, researchers investigated the factor structure of the child-report SDQ across seven different countries [30] but did not include the UK. In a further UK-based study, researchers tested the factor structure of the parent-report SDQ in a multi-ethnic UK cohort [31]. They were unable to demonstrate a consistent factor structure across ethnic groups, but their relatively small sample consisted of 3- to 5-year-olds and disproportionately represented one ethnic minority group (Pakistani). To the best of our knowledge, no previous study has investigated invariance based on multiple indicators of socioeconomic status (i.e., income and education). We addressed these gaps in previous research.
We sought to assess construct invariance across socioeconomic and ethnic groups in a large UK representative sample at multiple ages through childhood and adolescence. We were motivated by two research questions. First, to what extent does the five-factor structure of the SDQ (emotional problems, peer problems, conduct problems, hyperactivity/inattention, prosocial behaviour) fit the parent-report data across early childhood and adolescence at ages 3, 5, 7, 11, 14 and 17 years? (research question 1). Based on previous research in this sample [32], we expected the five-factor structure to have at least adequate fit at ages 5-14 years but not at ages 3 and 17 years. Second, to what extent is there socioeconomic- and ethnicity-based measurement invariance for the SDQ during early childhood and adolescence? (research question 2). Given the paucity of previous research assessing measurement invariance across these groups, we had no hypothesis about whether conditions of invariance would be met.

Ethical approval
The study was performed in accordance with the ethical standards as laid down in the 1964 Declaration of Helsinki and its later amendments. Ethical approval for data collection for the Millennium Cohort Study [MCS; 33] was granted by the National Health Service Research Ethics Committee. Participants provided written consent. Full details of the ethical process for the MCS are available at https://cls.ucl.ac.uk/wp-content/uploads/2017/07/MCS-Ethical-Approval-and-Consent-2019.pdf.

Study design
The study was a secondary analysis of existing data from a prospective cohort study. Our analysis was cross-sectional as we fitted models separately at each time point rather than including longitudinal relationships between variables.

Participants
The MCS [33] is a multi-disciplinary longitudinal study following approximately 19,000 UK children born between 2000 and 2002. Full sampling details can be found elsewhere [34 and https://cls.ucl.ac.uk/cls-studies/millennium-cohort-study/]. Briefly, child benefit records were screened to identify eligible families (child benefit was a universal social security benefit for families with at least one child). Families first participated when their child was 9 months old and were followed up when the child was 3, 5, 7, 11, 14 and 17 years of age. Trained researchers administered surveys and conducted interviews in family homes at each wave of data collection.
We used data from 17,274 participants (49% female) in our analyses. A breakdown of the sample by socioeconomic and ethnicity indicators is provided in Table 1. Participants were included if parent-report SDQ data were available for at least one data collection point. Some families had more than one child taking part in the study. To avoid nesting effects, only one child per family (selected at random) was included.

SDQ
The parent-report SDQ [4] was completed by a selected caregiver (the mother for >95% of the sample) at each contact. As described previously, the SDQ is a 25-item questionnaire consisting of five subscales each with five statements. The caregiver indicated the extent to which each statement described their child over the last six months on a three-point scale (0 = not true, 1 = somewhat true, 2 = certainly true). Items contained in each subscale are listed in the supporting information.

Ethnicity and socioeconomic status measures
Household income. Primary caregivers reported income from all sources (government benefits, employment etc.) when the child was three years old. If age three data was missing, then responses collected when the child was nine months old were used. The Organisation for Economic Co-operation and Development modified scale was used to standardise overall household income [35]. This was then used to create quintiles (1 = lowest income, 5 = highest income).
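The income derivation can be sketched as follows. The OECD-modified weights used here (1.0 for the first adult, 0.5 for each additional household member aged 14 or over, 0.3 for each child under 14) are the standard published values, but how the MCS counted household members and whether survey weights were applied when forming quintiles are not described here, so this unweighted version is illustrative only; all function names are hypothetical.

```python
import bisect
import statistics
from typing import List


def oecd_modified_scale(n_adults: int, n_children: int) -> float:
    """OECD-modified equivalence scale. n_adults counts household members
    aged 14+ (including the first adult); n_children counts those under 14."""
    if n_adults < 1 or n_children < 0:
        raise ValueError("household needs at least one adult")
    return 1.0 + 0.5 * (n_adults - 1) + 0.3 * n_children


def equivalised_income(income: float, n_adults: int, n_children: int) -> float:
    """Divide total household income by the equivalence scale."""
    return income / oecd_modified_scale(n_adults, n_children)


def income_quintiles(values: List[float]) -> List[int]:
    """Assign 1 (lowest) to 5 (highest) using unweighted sample cut points."""
    cuts = statistics.quantiles(values, n=5)  # four cut points
    # Values equal to a cut point are placed in the higher quintile.
    return [bisect.bisect_right(cuts, v) + 1 for v in values]
```

For example, a household of two adults and two young children has a scale value of 2.1, so a total income of 30,000 equivalises to roughly 14,286.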
Parent highest education. Primary caregivers reported their highest education level when the child was three years old (or age nine months, if data were missing at age three). These were converted into five categories (1 = O-Level / GCSE grades D-G, 2 = O-Level / GCSE grades […] Caribbean, Asian Indian, Asian Pakistani, Asian Bangladeshi, and Other. The first set of models was fitted using these ethnicity groupings. However, some models did not converge because of small participant numbers in some groups (e.g., there were only 91 Black Caribbean adolescents at age 17 years). We therefore reduced these groupings to five categories by combining Black African and Black Caribbean into a single Black category, and Asian Indian, Asian Pakistani, and Asian Bangladeshi into a single South Asian category.
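The ethnicity recoding amounts to a many-to-one mapping. In this sketch the label strings are hypothetical, "White" and "Mixed" are inferred from the five final groups, and the minimum group size passed to small_groups is an arbitrary illustrative threshold, not a rule used in the study.

```python
from collections import Counter
from typing import Dict, Iterable, List

# Detailed label -> collapsed group (labels are illustrative).
COLLAPSE: Dict[str, str] = {
    "White": "White",
    "Mixed": "Mixed",
    "Black African": "Black",
    "Black Caribbean": "Black",
    "Asian Indian": "South Asian",
    "Asian Pakistani": "South Asian",
    "Asian Bangladeshi": "South Asian",
    "Other": "Other",
}


def collapse_ethnicity(labels: Iterable[str]) -> Counter:
    """Map detailed labels to the five collapsed groups and count them."""
    return Counter(COLLAPSE[label] for label in labels)


def small_groups(counts: Counter, min_n: int) -> List[str]:
    """Groups with fewer than min_n members, which may hinder convergence."""
    return sorted(group for group, n in counts.items() if n < min_n)
```

Checking group sizes after collapsing is one way to anticipate the convergence problems described above before fitting multi-group models.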

Statistical analysis
To address research question 1, confirmatory factor analysis models were fitted separately at ages 3, 5, 7, 11, 14, and 17 years. We tested the commonly used five-factor structure of the SDQ: each of the 25 items was loaded onto one of the five latent factors (emotional problems, peer problems, conduct problems, hyperactivity/inattention, and prosocial behaviour), as previously described [32]. The fit indices of the confirmatory five-factor models were assessed to determine how well the models fit the data. Models with satisfactory fit indices (see below) were considered to fit the data well and deemed appropriate for further analyses. We did not test alternative factor structures, as previous work in this sample demonstrated that the five-factor structure fitted the data better than other structures [32]. SDQ items were treated as ordinal variables using the weighted least squares mean- and variance-adjusted (WLSMV) estimator. Model fit was considered adequate where comparative fit index [CFI, 38] values were >0.90 and root mean square error of approximation [RMSEA, 39] values were <0.06 [40].
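The fit criteria can be illustrated with the textbook formulas for RMSEA and CFI computed from chi-square statistics. This is a sketch under stated assumptions: with the WLSMV estimator, Mplus works with a mean- and variance-adjusted chi-square, so plugging its reported values into these maximum-likelihood-style formulas will not necessarily reproduce the indices in Table 2. Function names are hypothetical.

```python
import math


def rmsea(chi2: float, df: int, n: int) -> float:
    """RMSEA point estimate; this form uses N - 1 (some programs use N)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_m: float, df_m: int, chi2_b: float, df_b: int) -> float:
    """Comparative fit index of model M relative to baseline (null) model B."""
    d_m = max(chi2_m - df_m, 0.0)
    d_b = max(chi2_b - df_b, d_m)  # never below d_m, so CFI stays in [0, 1]
    return 1.0 if d_b == 0 else 1.0 - d_m / d_b


def adequate_fit(cfi_value: float, rmsea_value: float) -> bool:
    """Cut-offs used in this study: CFI > .90 and RMSEA < .06."""
    return cfi_value > 0.90 and rmsea_value < 0.06
```

Applied mechanically, adequate_fit(0.899, 0.05) returns False; a model with a CFI just under .90 but a good RMSEA therefore calls for a judgement based on both indices rather than on the CFI cut-off alone.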
To address research question 2, measurement invariance was tested at ages demonstrating adequate model fit (i.e., ages 5, 7, 11, and 14 years). The aim of our analyses was not to test longitudinal measurement invariance; this has already been demonstrated in this sample [32] and others [27]. Instead, we were specifically concerned with between-group invariance at each of the time points separately. Doing this allowed us to determine the extent to which group comparisons along socioeconomic and ethnicity groupings reflect genuine group differences. Models were fitted using the Mplus command "model: configural metric scalar". This command fits three nested models. In the configural model, the five-factor structure was imposed without constraining factor loadings or thresholds to be equal across groups. The metric model constrained factor loadings to be equal across all categories. The scalar model constrained both factor loadings and thresholds to be equal across categories. Non-invariance was assumed when there was a change in CFI of ≤ -.010 combined with either a change in RMSEA of ≥ .015 or a change in standardised root mean square residual (SRMR) of ≥ .030, as described in previous work [25]. At each age we separately tested invariance across household income, parental highest education, and ethnicity.
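The decision rule for comparing nested models can be written as a small function. This sketch applies the criteria stated above (a CFI change of ≤ -.010 together with an RMSEA change of ≥ .015 or an SRMR change of ≥ .030), computes each change as the more constrained model minus the less constrained one, and assumes configural fit has already been judged acceptable. The function and dictionary key names are hypothetical; this is not the Mplus procedure itself.

```python
from typing import Dict

LEVELS = ("configural", "metric", "scalar")


def non_invariant(d_cfi: float, d_rmsea: float, d_srmr: float) -> bool:
    """True when the added constraints worsen fit beyond the stated limits.
    Each argument is (more constrained) minus (less constrained)."""
    return d_cfi <= -0.010 and (d_rmsea >= 0.015 or d_srmr >= 0.030)


def highest_level(fits: Dict[str, Dict[str, float]]) -> str:
    """Step configural -> metric -> scalar, stopping at the first step that
    fails. fits maps level -> {"cfi": ..., "rmsea": ..., "srmr": ...}."""
    supported = LEVELS[0]
    for prev, curr in zip(LEVELS, LEVELS[1:]):
        a, b = fits[prev], fits[curr]
        if non_invariant(b["cfi"] - a["cfi"],
                         b["rmsea"] - a["rmsea"],
                         b["srmr"] - a["srmr"]):
            break
        supported = curr
    return supported
```

Because the levels are nested, a failure at the metric step means scalar invariance is never evaluated, which matches the hierarchical logic described in the introduction.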
A sample and non-response weight [41] was used in the invariance analysis to account for sample attrition and oversampling of ethnic minority and low-income households.

Missing data
There were missing data at each of the six time points. Relative to the original sample, response rates were 81% (age 3), 79% (age 5), 72% (age 7), 69% (age 11), 61% (age 14) and 74% (age 17). We did not fit longitudinal models and so do not report missingness at each time point as a function of the overall sample for the SDQ variables. Data were missing for ethnicity (7 observations, <1%), household income (51 observations, <1%), and parental highest education (4,245 observations, 25%) and were assumed to be missing at random. Missingness in highest education was related to ethnicity and household income. Missing data were handled using the weighted least squares mean- and variance-adjusted estimator in Mplus.

Descriptive statistics
Both the overall and stratified means (by socioeconomic and ethnicity indicators) of the SDQ subscales at each timepoint are shown in S1-S5 Tables in S1 File. Whilst we did not test change over time, we describe general trends in means here. Regarding overall means, there was a general trend of emotional difficulties and peer problems increasing between ages 3 and 17 years. Conversely, conduct problems and hyperactivity/inattention decreased between the ages 3 and 17 years. Prosocial behaviour increased between ages 3 and 11 years after which there was a decrease.
For both socioeconomic indicators, household income and parental highest education, there was a clear gradient: scores on each SDQ problem subscale (emotional difficulties, peer problems, conduct problems, hyperactivity/inattention) decreased with each increase in household income quintile or parent highest education category across all time points. In contrast, prosocial behaviour scores increased with each increase in household income quintile or parent highest education category at each of the six time points.
For ethnicity, there was no clear pattern. Generally, Black and White children had the fewest difficulties, while Mixed, South Asian, and Other ethnicity children had the most; however, this varied by age and SDQ subscale.

Internal consistency
The Cronbach's alphas for the SDQ subscales ranged from .47 to .71 at age 3 years, .52 to .77 at age 5 years, .58 to .79 at age 7 years, .63 to .79 at age 11 years, .62 to .76 at age 14 years, and .61 to .76 at age 17 years (S6 Table in S1 File). The Cronbach's alphas tended to increase with age. The peer problems subscale generally had the lowest Cronbach's alphas (except at 11 years), while the hyperactivity subscale had the highest at each time point.
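Cronbach's alpha follows from the item and total-score variances: α = k/(k - 1) × (1 - Σ item variances / total-score variance), for k items. A minimal sketch, assuming complete cases and sample (n - 1) variances; how the study handled incomplete responses is not described here.

```python
import statistics
from typing import Sequence


def cronbach_alpha(responses: Sequence[Sequence[float]]) -> float:
    """responses[i][j] is the score of respondent i on item j."""
    k = len(responses[0])
    if k < 2 or any(len(row) != k for row in responses):
        raise ValueError("need a rectangular table with at least two items")
    # Sample variance of each item across respondents.
    item_vars = [statistics.variance([row[j] for row in responses]) for j in range(k)]
    # Sample variance of the summed (subscale) score.
    total_var = statistics.variance([sum(row) for row in responses])
    if total_var == 0:
        raise ValueError("total score has zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

Because alpha grows with the number of items and their average intercorrelation, five-item subscales such as those of the SDQ can show modest values even when the items are reasonably related.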
The McDonald's omega values followed a similar pattern to the Cronbach's alphas (see S6 Table in S1 File), ranging from .47 to .72 at age 3 years, .52 to .77 at age 5 years, .59 to .79 at age 7 years, .66 to .79 at age 11 years, .64 to .78 at age 14 years, and .63 to .77 at age 17 years. The McDonald's omega values were also generally lowest for peer problems and highest for hyperactivity within each time point.
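For a single factor with standardised loadings λ and uncorrelated residuals, omega total is (Σλ)² / ((Σλ)² + Σ(1 - λ²)). The sketch below implements only this basic formula; which omega variant the study used (for example, one based on polychoric correlations for ordinal items) is not specified here, and the function name is hypothetical.

```python
from typing import Sequence


def mcdonald_omega(loadings: Sequence[float]) -> float:
    """Omega total for one factor from standardised loadings, assuming
    uncorrelated residuals with variance 1 - loading**2."""
    if not loadings or any(not -1.0 < lam < 1.0 for lam in loadings):
        raise ValueError("standardised loadings must lie in (-1, 1)")
    common = sum(loadings) ** 2                     # variance due to the factor
    unique = sum(1.0 - lam * lam for lam in loadings)  # residual variance
    return common / (common + unique)
```

Unlike alpha, omega does not assume equal loadings, which is why the two coefficients can diverge slightly, as in the ranges reported above.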

Factor structure
The five-factor structure of the SDQ was tested using a series of CFAs, one at each age. The model fit statistics for these CFAs are shown in Table 2. The five-factor structure had adequate fit at ages 7, 11, and 14 years: all CFI values were >.90 and all RMSEA values were <.06. At age 5 years, the CFI was .899. Given that the RMSEA was <.06 and the CFI was very close to the threshold, and consistent with the recommendation to consider multiple fit indices (Kline, 2016), the fit at age 5 years was also considered adequate. These results were consistent with previous research using the present and other samples [27,32]. The five-factor structure did not fit well at ages 3 and 17 years (CFIs <.90). Therefore, we focussed invariance analyses on ages 5-14 years.

Measurement invariance
Table 3 shows that configural invariance was demonstrated across household income quintiles, with acceptable configural model fit at all ages from 5 to 14 years. Metric invariance was achieved at each time point: constraining factor loadings to be equal across household income quintiles resulted in CFIs, RMSEAs, and SRMRs either improving or worsening only within the acceptable limits described in the statistical analysis section. Similarly, scalar invariance was achieved at each age: further constraining thresholds across household income quintiles also resulted in CFIs, RMSEAs, and SRMRs either improving or not worsening beyond the pre-specified limits.
Measurement invariance testing across parental education (Table 4) and ethnicity (Table 5) showed similar results: configural, metric, and scalar invariance models demonstrated acceptable fit at all time points, allowing scalar invariance to be accepted.

Discussion
We tested socioeconomic- and ethnicity-based invariance of the parent-report SDQ from age 3 to 17 years in a large representative UK population sample. Achieving invariance across socioeconomic and ethnicity groupings is critical for making meaningful comparisons across these groups; without it, such comparisons may reflect measurement differences rather than genuine differences. The results of our study demonstrate scalar invariance for household income, parent highest education, and ethnicity categories for the parent-report SDQ between ages 5 and 14 years. This suggests that the parent-report SDQ can be used to meaningfully compare inequalities in mental health across socioeconomic and ethnicity indicators from ages 5 to 14 years. These findings have implications for SDQ users in multicultural settings like the UK. Many population-based studies, including the MCS, use the SDQ to screen for psychopathology in samples from multiple ethnic and socioeconomic backgrounds within the same geographical locations. The practical implication of our findings is that they provide confidence in previous work using the parent-report SDQ to compare mental health difficulties across socioeconomic [e.g., 14] and multiple ethnicity groupings [e.g., 12]. In other words, between ages 5 and 14 years the parent-report SDQ is a valid instrument for comparing mental health difficulties in young people from different socioeconomic and ethnic backgrounds. Along with recent research demonstrating longitudinal invariance in the present sample [32], our findings suggest that the parent-report SDQ can be used to investigate how socioeconomic- and ethnicity-based inequalities in mental health develop longitudinally during childhood and adolescence. Building on this, the SDQ can be used with approaches such as growth curve modelling to map changes in mental health difficulties during development and how these changes differ across socioeconomic and ethnic groups.
Such an approach could further justify the investigation of developmentally sensitive periods in emerging inequalities amongst different socioeconomic and ethnic groups using an invariant instrument such as the SDQ.
However, our findings do not support comparisons across socioeconomic and ethnic groupings outside the 5 to 14 years age range. Consistent with previous analyses in this sample [32], the parent-report SDQ data did not fit the five-factor structure at ages 3 and 17 years, so configural invariance could not be established at these ages. Theoretically, these findings might imply that in early childhood, symptoms of mental health difficulties manifest differently than during later childhood and adolescence, or that the parent-report SDQ at this age has a different factor structure compared with later assessments. The poor fit of the five-factor structure at age 17 years seems intuitive. One possible explanation is that by this age adolescents spend more time away from their parents (e.g., with friends), so parents may be less aware of the difficulties their child is experiencing. Unlike young children, adolescents at this age are also able to self-report symptoms of mental health difficulties. An important implication of not establishing socioeconomic and ethnic invariance for the parent-report SDQ at ages 3 and 17 years is that further investigation is needed to understand whether alternative factor structures of the SDQ, or other measures of psychopathology, are invariant across socioeconomic and ethnic groupings outside the 5-14-year age range.
A number of strengths and weaknesses should be considered when interpreting these findings. A major strength is the use of a large, longitudinal, and representative sample which incorporated participants from diverse backgrounds. Specifically, families from low socioeconomic backgrounds and ethnic minorities were oversampled, and this provided sufficient power for comparative analyses. Data from a prospective cohort study also allowed for testing invariance at multiple time points. Thus, while we did not test longitudinal invariance (this having been recently demonstrated in the present sample [32]), we demonstrated that socioeconomic and ethnic invariance were consistent at multiple time points during development. These findings may not, however, hold outside the UK and will have to be tested separately. Additionally, the ethnic groupings that we used may not be ideal, as there is considerable heterogeneity within the commonly used UK ethnic groupings of White, South Asian, Black, Mixed, and Other. For example, some UK studies have suggested that Indian children tend to have a mental health advantage compared to Bangladeshi and Pakistani children [13]. Similarly, the Black grouping in our study might be problematic because it encompasses two culturally different groups (Black African and Black Caribbean), with substantial heterogeneity within these groups as well. Therefore, future research should consider these ethnic groups separately, where sample sizes allow. Our findings also need to be replicated both in the UK and in other contexts with high socioeconomic and ethnic diversity.

Conclusions
We demonstrate configural, metric, and scalar invariance for the parent-report SDQ between ages 5 and 14 years for household income, parent education, and ethnicity. Therefore, the parent-report SDQ measures comparable mental health constructs in children and adolescents from different socioeconomic and ethnic groups in the UK. Our findings validate previous use of the parent-report SDQ to investigate mental health disparities among young people from different ethnic and socioeconomic backgrounds. Our findings also support the utility of the parent-report SDQ in the future assessment of mental health inequalities across socioeconomic and ethnic groupings in the UK, and of how these inequalities change within different socioeconomic and ethnic groups.
Supporting information
S1 File. The supporting information contains the SDQ items and the descriptive statistics and internal reliability measures for each SDQ subscale by age. (DOCX)